Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System
Abstract
A speech coding method for a full-band speech analysis/synthesis system is described. In this work, full-band speech is defined as speech with a sampling frequency above 40 kHz, whose Nyquist frequency covers the audible frequency range. Prior work on speech coding has generally focused on narrowband speech with a sampling frequency below 16 kHz. Statistical parametric speech synthesis, on the other hand, currently uses full-band speech together with low-dimensional representations of the speech parameters. The purpose of this study is to achieve speech coding without deterioration for full-band speech. We focus on a high-quality speech analysis/synthesis system and on mel-cepstral analysis using frequency warping. As the frequency warping function, we directly use three auditory scales. We carried out a subjective evaluation using the WORLD vocoder and found that the optimum number of dimensions was around 50. The type of frequency warping did not significantly affect sound quality at these dimensions.
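The general idea of a low-dimensional, frequency-warped envelope representation can be illustrated with a minimal sketch. This is not the paper's mel-cepstral analysis: it assumes a mel warping (O'Shaughnessy form) as one possible auditory scale, linear interpolation for the warping, and a plain FFT-based cepstrum truncated to `n_dims` coefficients. The function names `encode_envelope` and `decode_envelope`, the 48 kHz sampling rate, the 1025-bin envelope, and the synthetic envelope are all hypothetical choices made for illustration.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Mel scale (O'Shaughnessy form); an assumed choice of auditory scale."""
    return 1127.0 * np.log1p(np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * np.expm1(np.asarray(mel, dtype=float) / 1127.0)


def encode_envelope(envelope, fs, n_dims):
    """Warp a spectral envelope (0..fs/2, linear bins) and keep n_dims cepstral coefficients."""
    envelope = np.asarray(envelope, dtype=float)
    n_bins = envelope.size
    if not 2 <= n_dims <= n_bins:
        raise ValueError("n_dims must be between 2 and the number of bins")
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    # Sample the log envelope at points that are uniform on the mel axis.
    mel_axis = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bins)
    warped_log = np.interp(mel_to_hz(mel_axis), freqs, np.log(envelope))
    # Real, even cepstrum of the warped log spectrum; keep the low quefrencies.
    cepstrum = np.fft.irfft(warped_log)
    return cepstrum[:n_dims]


def decode_envelope(coded, fs, n_bins):
    """Rebuild a smoothed envelope on linear bins from truncated coefficients."""
    coded = np.asarray(coded, dtype=float)
    full = np.zeros(2 * (n_bins - 1))
    full[:coded.size] = coded
    # Mirror coefficients 1..n_dims-1 so the sequence stays even-symmetric.
    full[-(coded.size - 1):] = coded[:0:-1]
    warped_log = np.fft.rfft(full).real
    # Undo the warping: read the mel-uniform samples back at linear frequencies.
    mel_axis = np.linspace(0.0, hz_to_mel(fs / 2.0), n_bins)
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    return np.exp(np.interp(hz_to_mel(freqs), mel_axis, warped_log))


if __name__ == "__main__":
    fs, n_bins = 48000, 1025  # assumed full-band setting
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    envelope = np.exp(np.cos(np.pi * freqs / (fs / 2.0)))  # synthetic, smooth
    coded = encode_envelope(envelope, fs, n_dims=50)
    restored = decode_envelope(coded, fs, n_bins)
    print("max log-spectral error:", np.max(np.abs(np.log(restored) - np.log(envelope))))
```

Swapping `hz_to_mel` and `mel_to_hz` for another monotonic scale is enough to try a different warping in this sketch. How the resulting quality compares across scales and dimensions is what the subjective evaluation in the abstract addresses, and this code does not reproduce it.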
Similar resources
Estimation of the spectral envelope of mixed spectrum
Speech modeling techniques used for analysis and synthesis usually rely on a source-filter representation where the source is a mixed spectrum signal, i.e., one which consists of both sinusoidal and wide-band noise-like components. In such models, it is of prime importance to estimate a spectral envelope which represents the main features of the speech magnitude spectrum. In this paper, we introduc...
Wideband Harmonic Model: Alignment and Noise Modeling for High Quality Speech Synthesis
Speech sinusoidal modeling has been successfully applied to a broad range of speech analysis, synthesis and modification tasks. However, developing a high fidelity full band sinusoidal model that preserves its high quality on speech transformation still remains an open research problem. Such a system can be extremely useful for high quality speech synthesis. In this paper we present an enhanced...
Entropy and Speech
In this thesis, we study the representation of speech signals and the estimation of information-theoretical measures from observations containing features of the speech signal. The main body of the thesis consists of four research papers. Paper A presents a compact representation of the speech signal that facilitates perfect reconstruction. The representation is constituted of models, model par...
Speech recognition with altered spectral distribution of envelope cues.
Recognition of consonants, vowels, and sentences was measured in conditions of reduced spectral resolution and distorted spectral distribution of temporal envelope cues. Speech materials were processed through four bandpass filters (analysis bands), half-wave rectified, and low-pass filtered to extract the temporal envelope from each band. The envelope from each speech band modulated a band-lim...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...